Comparing Exact Bayesian and BIC Markov Order Classifiers
Abstract
We use an exact Bayesian calculation to design classifiers that distinguish whether a finite sequence drawn from a finite alphabet is a sample path of a Markov chain of order k = 0 or of order k > 0. Three exact Bayes (EB) classifiers are derived, each corresponding to a different prior. We also include a classifier based on the Bayesian Information Criterion (BIC), a popular technique for Markov order estimation. Using thousands of random Markov chains of known order, we test the performance of the classifiers. In both average accuracy and ROC analyses, we find that EB classifiers with informative priors perform better than the BIC classifier, with the difference becoming strikingly large when either the size of the alphabet is large or the length of the sequence is small. We also test the classifiers on five real-world data sets and find that the EB classifications, unlike the BIC classifications, match the orders of the models with highest out-of-sample predictive accuracies.
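To make the idea of an exact Bayesian order classifier concrete, the sketch below computes a closed-form marginal likelihood for a k-th order chain and compares the evidence for k = 0 against the pooled evidence for k = 1, ..., max_order. Everything specific here is an assumption for illustration and is not taken from the paper: the symmetric Dirichlet prior with parameter `alpha` on each transition row, the prior mass `prior_zero` on order 0, the uniform split of the remaining mass over positive orders, and the function names themselves. The abstract does not state which three priors the authors use.

```python
import math
from collections import Counter

def log_marginal_likelihood(seq, k, alphabet_size, start, alpha=1.0):
    """Log evidence of seq[start:] under a k-th order Markov chain.

    Assumes (hypothetically) an independent symmetric Dirichlet(alpha)
    prior on every transition row; requires start >= k so that each
    scored symbol has a full context.
    """
    pair_counts = Counter()   # (context, next symbol) -> count
    ctx_totals = Counter()    # context -> count
    for i in range(start, len(seq)):
        ctx = tuple(seq[i - k:i])
        pair_counts[(ctx, seq[i])] += 1
        ctx_totals[ctx] += 1
    total = 0.0
    # Dirichlet-multinomial normalizer for each observed context.
    for n_c in ctx_totals.values():
        total += math.lgamma(alphabet_size * alpha)
        total -= math.lgamma(alphabet_size * alpha + n_c)
    # Per-symbol terms; unobserved pairs contribute zero.
    for n_ca in pair_counts.values():
        total += math.lgamma(alpha + n_ca) - math.lgamma(alpha)
    return total

def is_positive_order(seq, alphabet_size, max_order, alpha=1.0, prior_zero=0.5):
    """Return True if the posterior favors order k > 0 over k = 0."""
    start = max_order  # score the same symbols under every order
    log_ev_zero = math.log(prior_zero) + log_marginal_likelihood(
        seq, 0, alphabet_size, start, alpha)
    log_terms = [
        math.log((1.0 - prior_zero) / max_order)
        + log_marginal_likelihood(seq, k, alphabet_size, start, alpha)
        for k in range(1, max_order + 1)
    ]
    # Log-sum-exp over the positive orders for numerical stability.
    m = max(log_terms)
    log_ev_pos = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return log_ev_pos > log_ev_zero
```

For example, `is_positive_order("01" * 50, alphabet_size=2, max_order=2)` scores a strictly alternating binary sequence. Conditioning every order on the same first `max_order` symbols is a design choice made here so that the evidences are comparable.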
Similar resources
The consistency of the BIC Markov order estimator
The Bayesian Information Criterion (BIC) estimates the order of a Markov chain (with finite alphabet $A$) from observation of a sample path $x_1, x_2, \ldots, x_n$, as that value $k = \hat{k}$ that minimizes the sum of the negative logarithm of the $k$-th order maximum likelihood and the penalty term $\frac{|A|^k (|A| - 1)}{2} \log n$. We show that $\hat{k}$ equals the correct order of the chain, eventually almost surely as ...
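The estimator described above can be sketched in a few lines of Python. This is a minimal illustration, not the cited authors' implementation: the function names, the `max_order` search bound, and the choice to treat the first k symbols as given when computing the k-th order likelihood are all assumptions made here.

```python
import math
from collections import Counter

def max_log_likelihood(seq, k):
    """Maximized log-likelihood of seq under a k-th order Markov chain.

    The first k symbols are treated as given (an assumption of this
    sketch); transition probabilities are the empirical frequencies.
    """
    ctx_counts = Counter()    # context -> count
    pair_counts = Counter()   # (context, next symbol) -> count
    for i in range(k, len(seq)):
        ctx = tuple(seq[i - k:i])
        ctx_counts[ctx] += 1
        pair_counts[(ctx, seq[i])] += 1
    return sum(c * math.log(c / ctx_counts[ctx])
               for (ctx, _), c in pair_counts.items())

def bic_order(seq, alphabet_size, max_order):
    """Order in 0..max_order minimizing -log ML + |A|^k (|A|-1)/2 * log n."""
    n = len(seq)

    def score(k):
        penalty = alphabet_size ** k * (alphabet_size - 1) / 2 * math.log(n)
        return -max_log_likelihood(seq, k) + penalty

    return min(range(max_order + 1), key=score)
```

A call such as `bic_order("01" * 50, alphabet_size=2, max_order=3)` evaluates the penalized score for each candidate order and returns the smallest-scoring one; ties go to the lower order.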
Hidden Markov Random Field Model Selection Criteria Based on Mean Field-Like Approximations
Hidden Markov random fields appear naturally in problems such as image segmentation, where an unknown class assignment has to be estimated from the observations at each pixel. Choosing the probabilistic model that best accounts for the observations is an important first step for the quality of the subsequent estimation and analysis. A commonly used selection criterion is the Bayesian Informatio...
Consistency of the BIC Order Estimator
We announce two results on the problem of estimating the order of a Markov chain from observation of a sample path. First is that the Bayesian Information Criterion (BIC) leads to an almost surely consistent estimator. Second is that the Bayesian minimum description length estimator, of which the BIC estimator is an approximation, fails to be consistent for the uniformly distributed i.i.d. proc...
ToPS: A Framework to Manipulate Probabilistic Models of Sequence Data
Discrete Markovian models can be used to characterize patterns in sequences of values and have many applications in biological sequence analysis, including gene prediction, CpG island detection, alignment, and protein profiling. We present ToPS, a computational framework that can be used to implement different applications in bioinformatics analysis by combining eight kinds of models: (i) indep...
Simulation Results for Markov Model Selection: AIC, BIC and EDC
The higher-order Markov chain is, by its very definition, the most flexible model for finitely dependent sequences of random variables. In practical settings, estimation of the dependency order is needed to identify other model parameters. Based on the penalized log-likelihood function and within a nested hypothesis testing framework, several estimation alternatives have been proposed. The AIC, Akai...